INTRODUCTION



An Overview of Project Developmental Continuity (PDC)

The Office of Child Development originated Project 
Developmental Continuity (PDC) in 1974 as a Head Start 
demonstration program "aimed at promoting greater continuity 
of education and comprehensive child development services 
for children as they make the transition from preschool 
to school." The single most important effect of this undertaking,
it is hoped, will be to enhance the social competence
of the children served, that is, to increase their everyday
effectiveness in dealing with their environment (at school,
at home, in the community, and in society). Additional
effects are expected in the areas of parent involvement, 
teacher attitudes, and institutional change. 

As part of the overall Head Start Improvement and
Innovation effort, PDC emphasizes the involvement of administrators,
classroom staff, and parents in formulating educational
goals and developing a comprehensive curriculum.
The object of this effort is to ensure that children receive
continuous individualized attention as they progress from
Head Start through the early primary grades. Existing
discontinuities between Head Start and elementary school
experiences will be reduced, if the program is successful,
by PDC mechanisms which encourage communication and mutual
decision-making among preschool and elementary school teachers,
administrators, and parents.

School organizations at fifteen sites around the country
received OCD funding during 1974-1975 (Program Year I) to
design and plan implementation of the seven prescribed
components of PDC. The components focus respectively on:

• coordination of curriculum approaches and educational
goals;

• parent participation in policy-making, home-school
activities, and classroom visits or volunteering;

• comprehensive services (medical, nutritional, and
social) to children and families;

• preservice and inservice teacher training and child-rearing
training for parents;

• programs for bilingual/bicultural or multicultural
children;

• services for handicapped children and children with
learning disabilities;

• administrative coordination between and within Head
Start and elementary school.



Purposes of the PDC Evaluation

The major purpose of the PDC evaluation is to aid the
Office of Child Development in its efforts to design effective
programs for early childhood education. To accomplish this,
the evaluation will ultimately have to provide answers to the
following critical questions about PDC's impact:

• How does PDC affect children's social competence?

• How does PDC affect the school organization in
terms of philosophy, methods, and social climate?

• How does PDC affect parents?

• How does PDC affect the attitudes and work styles
of teachers and other staff?

In addition to describing the consequences of PDC, the
evaluation will describe and analyze the processes that led
to those consequences. Figure 1 illustrates the proportions
of the total evaluation effort that are devoted to each
component of the study. Although the assessment of child
social competence is very important and is emphasized in the
present report, the relationship of this to the rest of the
evaluation should not be neglected. Part B of Interim Report
III delineates the process evaluation more fully; it is
sufficient to emphasize here that the aims of the total
evaluation are to produce conclusions about what happened
(impact) and how and why it happened (process). This information
will facilitate future decisions about whether the program
should be replicated, and if so, how replication can best be
accomplished in the light of past experience.
Figure 1

PDC EVALUATION EFFORT
(Total 3-Year Study)
Purposes of this Report

The present year, Program Year II, has been reserved
as a time for sites to try out and refine the program strategies
they developed during the planning year. There has so far
been no measurement at all of program impact, and there will
be none until 1976-1977, by which time the sites will have
had a full year to pilot-test their strategies. During
1975-1976, the evaluation methodology also is being pilot-tested,
and this report addresses questions about three
issues fundamental to the integrity of the future evaluation:

1. Are the measuring instruments appropriate to 
the task? 

2. Are the children in PDC schools and those in 
comparison schools really comparable? 

3. Are enough children available in PDC and comparison 
schools at each site to permit a longitudinal study 
of program effects? 

This executive summary presents the major findings from
Interim Report III.¹ The methods followed in seeking answers
to these questions are described in Chapter II. Chapter III
presents the findings: sample characteristics, reliability
and validity analyses (question 1), comparability analyses
(question 2), and data related to sample size and attrition
(question 3). Conclusions and recommendations for continuing
the evaluation are presented in Chapter IV.



¹A Process Evaluation of Project Developmental ...
Interim Report III, Part A: Status of the Impact ...
High/Scope Foundation, March 1976.
METHODS OF THE EVALUATION

Instrument Selection Procedure¹

The two major objectives of the Impact Study, as set
by the Office of Child Development (RFP-4-75-HEW-OS), are
to assess the impact of PDC on the development of social
competence in children and to assess the program's impact
on preschools, schools, and other community organizations
and groups. Accordingly, evidence of PDC's effects will be
sought in four domains:

• the social competence of children, 

• the attitudes and behavior of teachers, adminis- 
trators, and PDC staff, 

• the attitudes and behavior of parents, 

• the structure and operation of the school and
community organizations.

For reasons of economy and credibility, the decision was
made to rely wherever possible on established instruments and
procedures for measurement of program impact rather than to
undertake the long, expensive, and uncertain process of
instrument development. The search for existing measures
was guided by standard criteria related to technical and
practical characteristics of the instruments.

The measures selected for pilot-testing in fall 1975
included individual child tests, teacher-administered ratings,
and classroom observations. These instruments were designed
to assess child social competence in the areas of social-emotional
development, psychomotor skills, and cognitive and language
development. A list of all the measures is included in
Appendix A of this report. Two additional measures are
being pilot-tested in spring 1976, and analysis of the
spring data will lead to final recommendations for the
child measurement battery.

¹For a more complete review of the procedure, see Interim Report
II, Part B: Recommendations for Measuring Program Impact.
High/Scope Foundation, June 1975.



Data Collection Procedures

Local testers were recruited from the PDC communities
and were trained by High/Scope Foundation staff at a seven-day
training conference in Michigan. In most cases, local
PDC project staff participated in screening of initial
applicants to assure the selection of testers compatible with
and acceptable to the local programs. Tester training provided
extensive practice in administering each of the child measures.
Careful monitoring of each tester was carried out during
training so that each tester achieved criterion-level performance
on each of the measures before the end of the training
session. Bilingual testers from the bilingual/bicultural
demonstration sites were trained by bilingual staff. A
system of on-site monitoring was established so that each
tester at a site was responsible for monitoring each of the
others on a weekly basis. In addition, testing did not begin
at any site until all of the testers were monitored and found
capable by one of the High/Scope trainers.

Data collection began the week of September 22 in four
sites and the following week in the remaining sites. The
length of data collection ranged from eight to fourteen weeks,
with an average of ten weeks being required to observe and
test all the children.

Data collection began with the classroom observations.
This permitted a greater opportunity for children to become
familiar with the testers before being taken out of class for
individual testing. Testing was generally accomplished in
two separate sessions, not on consecutive days but less than
ten days apart. In the bilingual/bicultural demonstration
sites an additional session was required. Assignment of
testers to children was made so that data collection in the
PDC and comparison programs would progress in parallel. In
addition, each tester was assigned both PDC and comparison
children to avoid confounding group and tester effects.

Before tests were sent to the High/Scope Foundation for
processing, the local site coordinator (one of the testers
selected for this task) reviewed all protocols using a
checklist of potential problems for each test. Completed
test protocols were sent to the High/Scope Foundation at the
end of each week.



Data Analysis Procedure

The baseline analyses of the child measures¹ used in
Project Developmental Continuity were aimed at: 1) determining
the comparability of PDC and comparison
groups within each site, and 2) determining the adequacy
of the measures for use in the longitudinal evaluation.
In order to accomplish these two objectives, the measures
had to be shown to have acceptable levels of reliability
and validity. The analyses included four major steps, as
indicated in Figure B-1 (Appendix B): reliability analyses,
validity analyses, comparability analyses, and aggregation
of data across sites. The details of these four steps are
described below.



Step 1: Are the Measures Reliable?

The procedures followed for the determination of
reliability of measures are pictured in Figure B-2. Reliability
was determined for each measure within each site.
To be considered adequately reliable, a measure had to have
a Cronbach alpha of at least .65. If a measure had an
initial alpha of .80 or higher, no further reliability criteria
were applied. If the measure did not initially obtain a Cronbach
alpha of .80, item response distributions² and item-total
correlations³ were inspected. Items were eliminated from
the scale if they appeared to be lowering the internal
consistency, and the alpha was recomputed on the modified
set of items. If the original or recomputed alpha for the
measure was over .65 for at least 10 sites and over .55 for
the remaining sites, the measure was considered reliable
for all sites. If the alpha was over .65 for fewer than
10 sites, it was considered reliable for only those sites.
(In all cases, if any reliability estimates were less than
.65, an effort was made to improve the administration
procedures for future testing periods.)

¹The procedures described in this section refer to all child
measures except the PDC Classroom Observation System and the
PDC Child Rating Scale.

²Many items on the tests are constructed so that either almost
all children will pass the item, or almost all children will
fail the item. The easy items are included in the test to
familiarize the child with the test, and the difficult items
are included to allow for the determination of the upper
bounds of children's abilities. Such items have little variance,
and hence lower the magnitude of any reliability estimate. If
the alpha was recomputed without these items in order to obtain
a more accurate estimate of the measure's reliability, the
items were still included in the score for the measure.
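The reliability rule described in Step 1 can be illustrated with a short sketch. It is not the evaluators' own program; the function names, the data layout (one row of item scores per child), and the handling of the case the text leaves unspecified (ten or more sites above .65 but some remaining site at or below .55) are assumptions made for illustration only.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of children, each a list of item scores."""
    k = len(rows[0])
    # Variance of each item across children.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Variance of the total score across children.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def reliable_sites(alpha_by_site, high=0.65, low=0.55, min_sites=10):
    """Apply the site-level decision rule to a {site: alpha} mapping.

    Assumption: when the 'all sites' condition is not met, only the
    sites whose alpha exceeds `high` are treated as reliable.
    """
    passing = {site for site, a in alpha_by_site.items() if a > high}
    rest_ok = all(a > low for a in alpha_by_site.values())
    if len(passing) >= min_sites and rest_ok:
        return set(alpha_by_site)
    return passing
```

With hypothetical alphas of .70 at ten sites and .60 at two others, `reliable_sites` returns all twelve sites; with only nine sites at .70, it returns just those nine.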

Step 2: Are the Measures Valid?

The procedures followed for the determination of
validity of measures are pictured in Figure [illegible]. As with
reliability, the literature indicates to some extent the
validity of the measures. But the validity of the measures
also needed to be ascertained within the context of the PDC
evaluation. Most of the measures were selected from larger
existing batteries, and items on most of the measures have
been modified, both to meet the needs of the sample being
tested and for use by paraprofessional testers. Therefore,
the validity of the measures within the PDC environment,
and within the test battery in which they are administered,
needed to be ascertained. The concern within this report
is with concurrent validity, the correlation with other
measures of the same construct as well as with measures
of other constructs. A measure should correlate highly
with other measures of the same construct, should correlate
moderately with measures of similar constructs, and should
not correlate at all with measures of independent constructs.

An hypothesized correlation matrix was constructed,
based on the constructs the measures were selected to
measure. The values in the matrix indicate the level of
relationship that theoretically should obtain between the
measures if they are valid measures of the constructs. The
actual correlations (within sites) were then evaluated
against the hypothesized correlations.



³Items with low item-total correlations appear not to be
measuring the same construct as the rest of the measure.
Items excluded for this reason (r < .30) were to be
eliminated thereafter from the instrument if the rest of
the instrument proved acceptable.

The hypothesized correlation matrix was constructed
by determining first the correlations within the three
areas of child tests; that is, within Cognitive-Language
measures, within Psychomotor measures, and within Social-
Emotional measures. Then the desired correlations were
determined between the three groups of tests. Generally,
higher correlations were expected within an area than
between areas. But each area is composed of sub-constructs,
so very few high correlations were expected.

The actual correlations between measures (the ones
found reliable) were calculated within each site, and the
following procedure was used to determine whether a given
measure was valid. First, the obtained intercorrelation
matrix was compared with the hypothesized matrix of Table 4,
and deviations of each correlation from the hypothesized one
were calculated (e.g., if the hypothesized correlation was
"medium" and the obtained was "low," a deviation of "-1"
was scored; if the hypothesized correlation was "zero" and
the obtained was "medium," a deviation of "+2" was scored).
For each measure, the absolute values of the deviations
were summed across all measures and divided by the number
of measures. If this ratio had a value of 1.0 or less,
the measure was considered valid. In effect this procedure
says that a measure is considered to have adequate concurrent
validity if, on the average, the obtained correlations with
other measures are within the range adjacent to the expected
value.
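The deviation scoring just described can be sketched as follows. The ordering zero < low < medium < high is inferred from the two examples in the text; the "high" level, the function names, and the input layout are assumptions. The cut points used to classify an obtained correlation as "low" or "medium" are not given in this summary, so the sketch takes the category labels as input rather than raw correlations.

```python
# Ordered correlation levels; "high" is assumed to follow "medium".
LEVELS = {"zero": 0, "low": 1, "medium": 2, "high": 3}

def deviation(hypothesized, obtained):
    """Signed number of levels by which the obtained value departs."""
    return LEVELS[obtained] - LEVELS[hypothesized]

def concurrent_validity(pairs, cutoff=1.0):
    """Mean absolute deviation for one measure against the others.

    `pairs` is a list of (hypothesized, obtained) level labels, one
    per other measure. Returns (is_valid, mean_absolute_deviation).
    """
    mean_abs = sum(abs(deviation(h, o)) for h, o in pairs) / len(pairs)
    return mean_abs <= cutoff, mean_abs
```

For a hypothetical measure compared with three others, the pairs ("medium", "low"), ("zero", "medium"), and ("high", "high") give deviations of -1, +2, and 0, a mean absolute deviation of 1.0, and therefore a decision of valid under the 1.0 criterion.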

Step 3: Are the Groups Comparable Within Sites?

The two preceding steps in the analysis established
which measures were useful for study of PDC's effects on
children. The next task was to determine the actual comparability
of PDC and comparison groups. The two groups in
each site were compared on a number of demographic variables
and on the performance measures found valid within that site.
For every variable, all available data entered into a test
of the equality of PDC vs. comparison group status. For
categorical data (on ethnicity, for example) the equality of
PDC-comparison group proportions was evaluated by means of
the chi-square statistic; for metric data (all the test
scores), equality of group means was determined by t tests.
The criterion of significance for each statistical test was
a probability value of less than .10.
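As an illustration of the categorical comparison in Step 3, the sketch below computes a Pearson chi-square for a 2 x 2 table (group by a two-category variable) and applies the p < .10 criterion. The counts are hypothetical, the function name is an assumption, and the report does not say whether a continuity correction was applied; none is used here. The t test for metric data is not shown.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, without continuity correction, for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) on 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the upper-tail probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows are PDC and comparison groups,
# columns are two categories of a background variable.
stat, p = chi_square_2x2(20, 10, 10, 20)
groups_differ = p < 0.10
```

With these made-up counts the statistic is about 6.67 and the groups would be flagged as differing on the variable.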
Step 4: Are the Groups Comparable Across Sites?



After completion of Step 3, data were aggregated
across sites for the subset of children who had no missing
data on seven selected performance measures. This procedure
was determined on a post-hoc basis in response to questions
raised by the within-site analysis of group comparability.
Data for the cross-site PDC and comparison group aggregations
were analyzed in the same manner as that described in Step 3.
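The selection of children for the cross-site aggregation amounts to a complete-case filter. A minimal sketch follows; the seven measures are not named in this summary, so the measure names and the record layout used here are placeholders.

```python
def complete_cases(children, measures):
    """Keep only children with a score on every listed measure."""
    return [
        child for child in children
        if all(child.get(m) is not None for m in measures)
    ]

# Placeholder names standing in for the seven selected measures.
SELECTED = ["m1", "m2", "m3", "m4", "m5", "m6", "m7"]

children = [
    {"id": 1, **{m: 10 for m in SELECTED}},
    {"id": 2, **{m: 12 for m in SELECTED}, "m4": None},  # one score missing
    {"id": 3, "m1": 8},                                  # most scores missing
]
aggregated = complete_cases(children, SELECTED)
```

Only the first hypothetical child survives the filter; the resulting cross-site sample would then be analyzed as in Step 3.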
III

FINDINGS



Descriptive Characteristics of the Samples

Data were collected for 1179 children in the 14 Project
Developmental Continuity sites which were in their first
operational year in 1975-76. In each site an attempt was
made to observe and test 30 to 45 PDC and 30 to 45 comparison
children (except in Georgia, where 118 elementary children
serve as the comparison group). Table 1 shows the actual
number of children for whom any data were collected in
each site ("Number in Full Sample"), as well as information
regarding ethnicity and dominant language of those children.
In California and Texas the PDC and comparison groups are
further divided into English- and Spanish-dominant children,
since the samples will be divided in that manner for the
remainder of the evaluation. Note that in most instances
the sample size of the PDC and comparison group is greater
than 30, with the exception of Arizona and Florida (and of
California and Texas when split by language).

Children were eliminated from the analytic sample
(used for evaluation of the measures and for testing comparability
of groups) if they were identified as being
handicapped or as having a dominant language other than
English in the non-bilingual/bicultural sites; other than
English or Spanish in California, Colorado, and Texas; or
other than English or Navajo in Arizona. Handicapped children
are and will be included in some aspects of the evaluation,
but are excluded for most aspects of this report.

The final column in Table 1 shows the number of children 
in each site and group who are included in the analytic 
sample for this report, a total of 959 children. 
Table 1

Descriptive Characteristics of the
Samples for Fall 1975 Data
Reliability and Validity of the Instruments



Estimates of reliability and validity were already
available for most of the instruments selected for use in
this evaluation and these estimates were used as one of the
bases for selection of measures. In addition, it is necessary
to establish the usefulness of these measures for the particular
populations of children within each PDC site. An
estimate of reliability for each measure for each site was
based on internal consistency, calculated as Cronbach's
alpha. Validity for each measure for each site was assessed
by comparing the obtained intercorrelations between measures
with an hypothesized set of concurrent validity correlations.



Reliability of the Instruments

An instrument was accepted as reliable for a site if
the final internal consistency coefficient (Cronbach's alpha)
for the measure was greater than .65. In cases where the
initial alpha was less than .80, efforts were made to
refine the scoring procedure in such a way as to increase the
magnitude of the coefficient. The refinement procedure
involved determining whether the alpha might be suppressed
due to (1) a large percentage of the responses to any item
falling into just one scoring category (meaning the item was
too easy or too difficult or irrelevant), or (2) presence of
any item that was clearly unrelated to the rest of the items
on the measure. If either condition was discovered, the
item in question was deleted from the scale and the alpha was
recalculated. Items deleted from alpha calculations for
being too "hard" are still to be retained in the measure as
administered; this is to allow for improved performance as
the children mature. Items deleted from a scale due to low
correlation with other items were to be deleted entirely
from the measure if the remainder of the measure proved
reliable, but this situation did not occur.
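The two screening conditions in the refinement procedure can be sketched as a simple item check. The .30 floor for item-total correlation comes from the footnote in Chapter II; the 90 percent threshold for "a large percentage" of responses in one category is an assumption, as is the choice to correlate each item with the total of the remaining items rather than with the full total. Function names are illustrative.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def flag_items(rows, max_share=0.90, min_r=0.30):
    """Return {item_index: [reasons]} for items that may suppress alpha."""
    flags = {}
    for i in range(len(rows[0])):
        item = [row[i] for row in rows]
        rest = [sum(row) - row[i] for row in rows]  # total of the other items
        share = Counter(item).most_common(1)[0][1] / len(item)
        reasons = []
        if share >= max_share:          # condition (1): one category dominates
            reasons.append("low variance")
        if pearson(item, rest) < min_r:  # condition (2): unrelated to the rest
            reasons.append("low item-total r")
        if reasons:
            flags[i] = reasons
    return flags
```

Flagged items would be dropped from the alpha calculation only; as the text notes, items that are merely too hard stay in the measure as administered.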

Based on the original and recomputed estimates of internal
consistency, a decision was made regarding the reliability of
each measure for each site. Those decisions are summarized
in Table 2. The shaded portions of the table indicate that
the measure was not administered at that site.



Note that the Stephens-Delys Reinforcement Contingency Interview
does not appear in this or subsequent tables. The measure
was administered at eight sites, but because probes and
responses were not complete enough, scoring proved impossible.
Table 2

Summary of Reliability and Validity Decisions
by Measure and by Site for Fall 1975 Data

For four sites, Arm Coordination and Leg Coordination were added to form one scale in
order to improve internal consistency.

... either PIPS score since the scales contained only one item. Validity was not ...
for the PIPS-Locus of Control since no hypothesized values were stated.
Table 2 shows that most instruments were reliable in
every site at which they were administered. The exceptions
are these:

• Say and Tell was reliable at none of the sites;
no subset of items could be found which would
yield an acceptable level of reliability.

• Block Building was reliable at none of the sites;
nearly all of the children tested passed two of
the four items and a substantial percentage passed
a third item. It appears that this measure is not
appropriate to the age level of children in this
evaluation.

• Conceptual Grouping was reliable in three sites
only.

• The Verbal Memory measure achieved acceptable alphas
when the items were divided into Part 1 (repeating
words) and Part 3 (repeating a story). Part 1
was reliable in all sites except Texas-English.
Part 3 was reliable in all sites.

• Arm Coordination and Leg Coordination were reliable
in only six and five sites respectively. It was
possible to achieve reliability for some sites by
combining the six Arm and the six Leg items into
one scale.

• Do You Know?, a site-specific measure administered
in only two sites, was reliable in West Virginia
but not in Florida.

It should be noted that reliabilities and validities were
calculated separately for the English and Spanish versions
of the measures, the California-Spanish and Texas-Spanish
sites representing the Spanish versions.

In order to summarize the level of internal consistency
for the measures in the total sample, the values of the
reliability estimates (Cronbach alpha) for each measure
were computed for the total samples of English- and Spanish-speaking
children, combined across sites (see Table 3).
These alphas were calculated on the basis of all items in
each measure, i.e., no attempt was made to "boost" them
as was done at the site level of analysis. The alphas
for the English versions of the measures were found to
be very similar to those for the Spanish versions. As
will be noted in later tables, the validities also tend
to be very similar.

Table 3

Estimates of Reliability of the Child Measures,
Based on Cronbach's Alpha (Internal Consistency)ᵃ
for Fall 1975 Data

PIPS-Locus of Control    203    X      0
POCL (High/Scope)        719    .90    87    .87

ᵃThe samples consisted of all PDC and comparison children across all
sites. Alphas were calculated separately for English- and Spanish-dominant
children. All measures except the POCL (a rating scale)
had an English and a Spanish version.

ᵇThe bilingual children within the English- and Spanish-dominant
samples received both the English and Spanish versions of the BSM.
Monolingual children received only the version appropriate to their
group.



Validity of the Instruments

When an instrument was accepted as reliable within
a site, a total score on the measure was calculated for
each child to whom the measure was administered in that
site. The validity of the measure was then evaluated for
that site.

The method of evaluating validity for the purpose
of this report is based on the concept of concurrent
validity. The instruments were selected to measure specific
aspects of a child's social competence. Those presented
here focus on three areas of social competence: Cognitive-Language,
Psychomotor, and Social-Emotional. A convergent-discriminant
method of assessing validity was used; under
this method the assumption is made that if an instrument
is actually measuring the construct it was intended to
measure, the results will correlate highly with other
measures of the same construct, will correlate moderately
with measures of similar constructs, and will not correlate
at all with measures of independent constructs.

An hypothesized correlation matrix was developed
(Table 4) which set an expected range of correlation values
for each pair of measures based on the similarity of the
constructs they are supposed to be measuring. In general,
higher correlations were expected within the three areas of
social competence than between the areas, but a degree of
overlap between the areas was also expected. Actual intercorrelations
were then calculated within each site for the
measures that were judged to be reliable in that site (or
for which reliability could not be calculated). The correlations
were compared with the expected correlations in the
manner described in Chapter II.

Table 2 summarizes the decisions made regarding the
validity of the measures (in addition to the reliability).
Most instruments which were judged to be reliable were also
found valid. The exceptions were these:
• The combination of Arm and Leg Coordination was
not valid in Arizona, Georgia, or within the
Texas-English group.

• In Arizona, only Draw-A-Child was valid.

• In the California-Spanish group the BSM-Spanish
does not appear to be valid, but the sample size
is small, making validity more difficult to
establish.

• In the Texas-English group (another small
sample) only the BSM-English and Verbal Fluency
appear to be valid.

• Although no expected correlation values were
stated for the Internal Locus of Control scoring
of the PIPS, the consistent low negative correlation
of this measure with all other measures
suggests that it is probably not measuring the
focal construct.

Table 4 shows the hypothesized correlation values,
and a summary of the site-level validity (correlation)
matrices appears in Tables 5 and 6. Tables 5 and 6 show
the obtained correlations for the total samples of English-
and Spanish-speaking children across all sites (for children
in sites where the measures were reliable). The N on which
the correlation is based appears below the correlation
value. The cells in which the obtained correlation value
falls within the hypothesized range are demarcated by
heavy lines. In general, many of the measures had very
satisfactory levels of validity; even when the correlations
were not in the expected range, they tended to be close.



Table 5

Intercorrelations of Child Measures for English-Dominant Children,
Combined Across Groups and Sites for Fall 1975 Data

Table 6

Intercorrelations of Child Measures for Spanish-Dominant Children,
Across Groups and Sites for Fall 1975 Data



Comparability of PDC and Comparison Groups



Information on background (demographic) characteristics
of PDC and comparison children was collected, where available,
to provide a basis for selection of the final analytic
samples and to permit examination of the demographic similarity
of the two groups. Children with handicaps that interfered
with valid testing were excluded from analysis, as were
children at non-bilingual sites whose primary language was
not English; it was judged inappropriate to include the
test scores of these two groups of children in the analysis
because it seemed unlikely that their scores could be
interpreted in the same way as those of other children.
(Factors other than basic aptitude can complicate the
performance of handicapped children, and the Spanish-translated
tests administered to Spanish-dominant children cannot be
presumed to be, a priori, equivalent to the English versions.)

Only at two sites, California and Texas, were there
enough Spanish-dominant children to constitute statistically
adequate Spanish-speaking samples. In these sites child
data were grouped separately for English and Spanish speakers.
Because of the sparse numbers of Spanish-dominant children
at other sites, such children were excluded from site-level
analysis rather than being incorporated into a special
sample.

Once the final analytic samples had been established,
analyses were performed to determine just how comparable the
PDC and comparison groups really are at each site. Both
background characteristics and test performance were examined
and the results are shown in Table 7. Note that comparisons
of the full samples, prior to exclusion of handicapped
children and non-English speakers, are presented first
because it is not otherwise possible to compare proportions
of PDC and comparison children on the variables of Handicap
and Language Dominance. Analyses of ethnic proportions
were done both for the full and the analytic samples since
exclusion of non-English speakers could artificially produce
the appearance of ethnic similarity between treatment groups.

At the level of the analytic sample, the background 
variables eKamined represent characteristics that have been 
found in, past research to be related to school performance. 
If the groups are not initially comparable on these dimensions, 
it is possible that the effects produced by PDC will be masked 
by extraneous differences unless these differences are somehow . 
taken into account. 

Table 7

Comparability of PDC and Comparison Groups at Each Site

For each site and for each variable appearing in Table
7, the assumption of equal PDC and comparison group means
was tested statistically (using the chi-square technique for
categorical variables and t tests for metric variables).
All available data entered into each analysis, meaning that
even if data were missing for a particular child on one or
more variables, data obtained for that child on other
variables did enter into the respective analyses. A
difference was declared to exist between PDC and comparison
groups if analysis indicated the chance probability of the
observed difference to be less than one in ten (p < .10).
This criterion was judged to strike a balance between the
need to be sensitive to small differences (which might be
spurious) and the need to be fairly confident that any
single difference found statistically significant is real
and not just a random occurrence. In other words, it is
important to be able to detect group differences in, say,
ethnicity at a particular site, but it is also important
not to declare a difference where none really exists.
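
The comparability test for a categorical variable can be
sketched as follows. This is a minimal illustration only,
assuming a 2 x 2 table (group by category) and a Pearson
chi-square without continuity correction; the report does not
say which variant was used, and the function names and the
counts in the usage note are hypothetical.

```python
import math

ALPHA = 0.10  # criterion stated in the report (p < .10)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Rows are the two groups (e.g. PDC, comparison); columns are the two
    categories of a background variable (e.g. boys, girls).
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one child")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def groups_differ(a, b, c, d, alpha=ALPHA):
    """Apply the p < alpha decision rule to a 2 x 2 table."""
    _, p_value = chi_square_2x2(a, b, c, d)
    return p_value < alpha
```

With hypothetical counts, `groups_differ(10, 20, 20, 10)` flags a
difference, while a perfectly balanced table such as
`groups_differ(15, 15, 15, 15)` does not.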



Site-Level Findings



The asterisks in Table 7 mark statistically significant
differences between PDC and comparison groups. If the
analytic samples were perfectly matched, one could expect
to find about ten significant group differences reported in
the middle section of the table (in 10% of its 99 full cells);
instead, the groups were found to differ in 28 instances.
Some of these differences are more serious than others.
Ethnicity is invariably found to be an important background
factor in studies of school effects; thus if PDC and comparison
groups are not balanced ethnically, analysis of PDC effects in
the future can be confounded. On the other hand, differences on
the single variable Number of Siblings would not be expected
to distort program effects very powerfully (but if a difference
exists on this variable in conjunction with differences on
Mother's Employment Status and Father's Presence/Absence,
it is reasonable to presume that the groups are not equal
socioeconomically, and this would be considered to be an
inequality of some consequence).
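
The chance expectation cited above can be checked with a short
calculation. The sketch below assumes, for illustration only,
that the 99 tests are independent (they are not strictly so,
since background variables are correlated within a site); the
function names are hypothetical.

```python
import math


def expected_chance_differences(n_tests, alpha):
    """Number of 'significant' results expected by chance alone."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(X >= k) for X ~ Binomial(n_tests, alpha)."""
    return sum(
        math.comb(n_tests, i) * alpha ** i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


# 99 full cells tested at p < .10: about ten differences by chance.
expected = expected_chance_differences(99, 0.10)
# Probability of 28 or more differences if the groups were matched.
tail = prob_at_least(28, 99, 0.10)
```

Under the independence assumption, `tail` is very small, which
is consistent with the report's reading that the 28 observed
differences reflect real imbalances rather than chance.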

The number of differences on performance measures shown
in Table 7 is smaller than might be expected in view of
the observed background differences: chance variation would
result in about ten significant test differences, and only
three more than that were found. This may mean that the
background variables measured are relatively weak in their
effects at this point or that the noted group imbalances on
those variables are not great (even if statistically
significant).

Because of the numerous demographic differences dis-
covered between PDC and comparison groups at many of the
sites, the prospect of conducting site-level analyses is
uncertain at this time. Group imbalances can often be
controlled statistically so that they do not confound analysis
of treatment effects, but this is much easier with large
samples than with small ones. Thus the feasibility of
analyzing data aggregated across all sites was explored.

Aggregate-Level Findings

The fact that background data were not available for
many children and that the tests were not all administered
at all sites makes it impractical to add all the child data
together to form a totally inclusive aggregate. For illus-
trative purposes, though, all children with complete data
on seven major performance measures were pooled into aggre-
gate PDC and comparison groups with a combined size of 6[?]4.
Figure 2 shows the distributions of children in these
groups on certain background variables and Figure 3 shows
their relative standing on seven performance measures plus
two more background variables. At the aggregate level,
the similarities of the groups are more prominent than
their differences. Although PDC children are likelier
to be black and to have attended preschool before this year,
there are no other demographic differences and only one
difference in test performance (that one, as can be seen,
is quite small in the perspective of the range of obtained
scores).

The differences found at the site level appear not to
consistently favor one group over the other, since they
tend to disappear when the data are aggregated (except on
the variables of Ethnicity and Prior Preschool Experience).
Thus in performing future impact analyses, it appears that
it will be feasible to aggregate data across sites to
"smooth out" most of the imbalances that may exist and to
control any remaining imbalances by statistical means.



Sample Size Requirements and Sample Availability



[Figure 2. Background Characteristics of Aggregated Samples
of PDC and Comparison Children]

[Figure 3. Mean Standard Scores of Aggregated Samples of PDC
and Comparison Children on Selected Demographic and Performance
Measures. Legend: P = mean of PDC children; C = mean of
comparison children; bar = high/low score in combined groups.
Horizontal axis: z-score units, -1 to +1 (based on score
distributions of combined groups). Each variable is flagged
for a significant group difference (p < .10). Variables shown
include Number of Siblings, Pupil Observation Checklist,
BSM-English, Draw-A-Child, Block Design (WPPSI), Verbal
Fluency, Verbal Memory Part 1, and Verbal Memory Part 3.]

In 1974 and 1975, sites were asked to submit data
documenting attrition rates from grade to grade (beginning
in Head Start) in designated PDC and comparison schools.
Based on the data provided, estimates were made of the
Head Start sample sizes necessary at each site to ensure
that at least 30 children would remain in each of the two
groups through the third grade (the terminus of the pros-
pective longitudinal study). These estimates were first
published in Interim Report II, Part A (1975) and are
reprinted here, with some revisions, in Table 8. The table
also shows each site's most recent projection of PDC and
comparison group sizes for fall 1976 (the children entering
at that time, Cohort 2, will be the focal samples of the
Impact Study). It is evident in Table 8 that many of the
sites do not expect to enroll the specified numbers of
children. In cases where the number enrolled next fall
turns out to be within 10-20% of the requirement, it might
still be possible to perform analyses of program effects
within site, depending on the degree of group comparability;
in cases where the numbers are 10-20% low and the groups
depart from comparability, the samples might still contribute
usefully to a cross-site aggregation of PDC and comparison
groups, upon which the analyses could be carried out. But
in cases where enrollment for the coming school year falls
drastically below the requirement, the utility of continuing
the child testing phase of the Impact Study is questionable:
there will be no possibility of a within-site analysis,
and very small samples are as likely to complicate an
analysis at the aggregate level as to facilitate it.
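
The logic behind the sample size requirements can be sketched
as below. The report does not give its estimation formula, so
this is an assumption: attrition is treated as a constant
proportion lost at each grade transition, and the entering
sample is sized so that 30 children are expected to remain.
The retention rates in the usage note are hypothetical; the
required and expected figures are taken from Table 8.

```python
import math


def required_head_start_sample(retention_rates, target_remaining=30):
    """Smallest entering sample expected to leave `target_remaining`
    children after every grade-to-grade transition.

    retention_rates: proportion of children retained at each transition
    (Head Start to kindergarten, kindergarten to grade 1, and so on).
    """
    cumulative = 1.0
    for rate in retention_rates:
        if not 0.0 < rate <= 1.0:
            raise ValueError("each retention rate must be in (0, 1]")
        cumulative *= rate
    return math.ceil(target_remaining / cumulative)


def shortfall_fraction(required, expected):
    """Fraction by which expected enrollment falls below the requirement."""
    if required <= 0:
        raise ValueError("required must be positive")
    return max(0.0, (required - expected) / required)
```

For example, two transitions with a hypothetical 80% retention
each give `required_head_start_sample([0.8, 0.8])` = 47. Using
Table 8, `shortfall_fraction(75, 60)` is 0.20 for Connecticut
(at the edge of the 10-20% band discussed above), while
`shortfall_fraction(60, 22)` for Arizona is well beyond it.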
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Table 8 



Head Start Samples Required and Samples Projected
at Each Site for Fall, 1976

                        PDC Classes                 Comparison Classes
               ------------------------------  ------------------------------
               Number    Number    High-Low    Number    Number    High-Low
               Required  Expected  Range of    Required  Expected  Range of
Site                               Expectation                     Expectation

ARIZONA        60        22        20-30       60        22        20-30
CALIFORNIA     45        47        45-50       45        57        55-61
COLORADO       60        68        68-68       [?]0      68        68-68
CONNECTICUT    75        60        45-60       75        60        45-60
GEORGIA        60        60        50-60       --        --        --
FLORIDA        75        45        30-50       75        37        30-40
IOWA           60        60        45-75       75        65        60-70
MARYLAND       [?]5      70                    75        60
MICHIGAN       60        75        75-75       75        60        60-65
NEW JERSEY     60        70        70-70       60        45        45-45
TEXAS          75                  [illegible] 75        45        40-50
UTAH           65        65        35-70       75        65        40-80
WASHINGTON     75        60        60-60       75        100       92-100
WEST VIRGINIA  45        45                    45        45
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RECOMMENDATIONS AND CONCLUSIONS 

The evaluation activities reviewed in this report were
designed to result in recommendations regarding the Impact
Study component of the Developmental Continuity evaluation.
The recommendations outlined in this chapter are based on
two sets of data collected over the past year and on other
considerations affecting the overall quality of the evalu-
ation. The first set of data consists of information on the
suitability of the instruments. Alterations in the battery
based on analysis of the fall data are summarized in the
first section of this chapter. The second set of data
includes information on the number of children to be served
by the programs and expected sample attrition, comparability
of PDC and comparison Head Start groups on background charac-
teristics, and initial comparability of groups on child
performance measures. Recommendations stemming from these
sources of information are presented in the second part
of this chapter.

Other considerations have also been taken into account 
in preparing the recommendations. These include the tentative 
nature of some of the sample characteristics, the contribution 
of individual sites to an understanding of the implementation 
process, and the importance of a comprehensive Impact Study
that examines parent, teacher, and institutional effects as 
well as child impact. 



Suitability of the Instruments



Usefulness of the Present Battery

The instruments that provided the data for the present
phase of the Impact Study were selected expressly because of
their apparent relevance to the areas of social competence
that PDC is designed to affect. Analyses of the internal
consistency reliability and concurrent validity of the
instruments indicate that most of them, at most of the sites,
seem suitable for assessing the behaviors of interest.
Table 9 shows the decisions that have been made, based on
these analyses, regarding future use of the instruments.
Half the total number of instruments examined were judged
adequate for future use with little or no further modification
in content, administration, or scoring. While it is possible
that continued efforts will be made to refine this first
group of instruments, the task of refining those in the
second group has immediate priority; the problems encountered
with these instruments were judged to be soluble for one
or more of several reasons:

• The instrument was judged to be reliable and valid
in at least some sites, suggesting that the circum-
stances that caused it to fail at other sites may
be correctable with further modifications;

• The instrument failed uniformly across sites due
to circumstances that are correctable (e.g.,
confusing instructions);

• The instrument represents an attempt to tap
important behaviors that develop later, so better
results may be expected in the future as the
children mature.

The three scales excluded from future use were found either
to provide no useful discrimination among children (due
to an extremely high percentage of children obtaining high
scores) or to present difficulties in administration or
scoring for which there are no acceptable solutions.

There is ample reason to believe that the set of instru-
ments now proposed will be capable of detecting growth along
some of the social competence dimensions of greatest interest.
The battery as it is now constituted appears strongest in
the cognitive-language area. It is augmented by a classroom
observation system that focuses on social/interactive dimen-
sions and other measures discussed in the following section.



Plans for Future Refinement of the Battery 

Two instruments were administered in fall 1975 that have
been described (see Appendix A) but not discussed elsewhere
in this report. The PDC Classroom Observation System and the
PDC Child Rating Scale are being developed especially for
purposes of this evaluation (unlike the other instruments
in the battery, which had been tested previously in their
present forms), and it was deemed inappropriate to present
premature reports on their psychometric properties or to
judge group comparability on the basis of the scores they
provided. These instruments will be administered again in
spring 1976, however, and their utility will be reviewed
in the next interim report. If they are judged to be reliable
and valid, they will provide additional perspectives on the
development of children's social competence, particularly
in the social-emotional area.

Table 9

Conclusions Regarding Suitability of the Instruments

Measures to be Retained without Change. These measures
were judged reliable and valid in all sites and will continue
to be used in the evaluation with little or no modification:

    Bilingual Syntax Measure - English
    Bilingual Syntax Measure - Spanish
    Block Design (WPPSI)
    Verbal Fluency (MSCA)
    Verbal Memory, Parts 1 and 3 (MSCA)
    Draw-A-Child (MSCA)
    POCL (High/Scope Foundation)

Measures to be Retained Provisionally. These measures
are being modified to correct problems detected during data
collection or during data analysis relevant to reliability
and validity. They will continue to be used in the evalu-
ation until reliability and validity are reevaluated:

    Conceptual Grouping (MSCA)
    Say and Tell (CIRCUS)
    Arm Coordination (MSCA)
    Leg Coordination (MSCA)
    PIPS-Solutions

Measures to be Discontinued from the Evaluation. These
measures were found to be inappropriate for the age levels
spanned by the evaluation:

    Block Building (MSCA)
    PIPS-Locus of Control¹
    Stephens-Delys Reinforcement Contingency Interview

Measures to be Retained for Further Development. Two
measures will continue to be retained:

    PDC Classroom Observation System
    PDC Child Rating Scale

¹In the use of the PIPS, it is only the Locus of Control scoring
procedure that will be discontinued; the PIPS will continue to
be used, but with only the "solutions" score.

Other promising instruments will be used experimentally
in the future with an eye to complementing the present battery,
or to replacing complex instruments with simpler ones. (For
example, a measure of productive language will be pilot-tested
in three sites this spring, and a measure of children's attitude
toward school will also be obtained.) In addition, future
analyses of test data will attend to the possibility of
eliminating instruments that yield redundant information.
Although it is valuable to provide for some redundancy of
measurement as a way of certifying that the measurements
obtained are valid, it is also important that the PDC evaluation
not intrude on the program any more than is strictly necessary.
Thus, in the child testing phase of the evaluation, where
assessment activities directly involve the child and the
teacher, special efforts are being made to maximize efficiency
of measurement.



Recommendations Based on Suitability of the Samples

Since the current three-year evaluation effort is con-
ceptualized as a feasibility study for conducting a five-year
longitudinal impact evaluation, the six recommendations made
here are concerned primarily with procedures that will best
provide the essential information for judging the potential
for a longitudinal study. There is not sufficient information
at this time for making those long-range projections, but on
the basis of fall 1976 data, recommendations for the longi-
tudinal study will be made in March 1977.

Although the central focus of these recommendations is
the Impact Study, and although this report comprises child
impact data, the recommendations are made within the context
of the entire Developmental Continuity project and the
comprehensive evaluation of that project. As a Head Start
demonstration program, PDC is designed to demonstrate the
linkage of Head Start and elementary programs to achieve
greater continuity. An understanding of how this happens
is critical to future decisions about programs of this nature.
Thus, the process evaluation of PDC was seen by the original
planners in the Office of Child Development as having an
importance equal to that of the Impact Study. In fact,
about fifty percent of the total evaluation effort is devoted
to the process evaluation, including an analysis of the
initial planning process, description of start-up and imple-
mentation activities, and assessment of the extent to which
programs are successful in implementing the total concept
of PDC (see Figure 1, p. 3).



1. The PDC Impact Study should be continued at all
fourteen sites.



When all factors considered in this report are taken
together with overall considerations surrounding the evalu-
ation, there seems to be every reason for retaining all sites
for participation in the Impact Study. Within the Impact
Study, the assessment of impact on children is just one facet,
albeit a very important one. The goals of PDC relate to
changes in parents and teachers and to changes in the schools
as institutions. These are important goals, and procedures
have been developed for assessing program impact in these
areas as well.

Even if the assessment of child impact were impossible
(which does not appear to be the case), there would still be
value in continuing the Impact Study. It is also felt that
each site contributes important dimensions to the assessment
of the implementation process and that the capacity of the
evaluation to answer questions about relationships between
implementation and impact will be enhanced by making use of
all possible data. Consideration should also be given to
examining child impact by means other than testing in sites
where testing is judged to be infeasible.



2. Child testing should be continued at all but two of
the sites.



The conclusions regarding child testing at each site
are summarized in Table 10. After considering the factors
of sample size and attrition, comparability of PDC and com-
parison groups, and suitability of the measures at each site,
the fourteen sites clustered into four categories that
describe the nature and extent of the problems with one or
more of these factors.

Three sites seemed to have few problems: California-
English, Colorado and Washington. If these sites continue
with current plans for Head Start enrollment and if the
attrition rate does not substantially increase, there should
be little difficulty in conducting within-site analyses of
the impact of PDC on child social competence.

In six sites (Georgia, Iowa, Maryland, Michigan, Utah,
and West Virginia) there is little problem with the expected
sample size, but differences between the PDC and comparison
groups raise some concerns about the feasibility of within-
site child impact analyses. The following summarizes the
situation at each site:

• Georgia--the apparent imbalance on background
characteristics is based on a comparison of Head
Start and elementary children, since there is no
contemporaneous comparison group; the elementary
school enrolls a wide range of children, including
those who would not meet Head Start eligibility
requirements. The possibility of selecting from
the total elementary sample a subset that best
matches the PDC Head Start sample will be explored.

• Iowa--group differences were found in the analytic
sample on three background variables; there was a
higher proportion of whites in the comparison group
than in PDC and a higher proportion of PDC children
had previous preschool experience; the groups
differed on only one performance measure.

• Maryland--the full samples differed on important
background characteristics (the PDC group had a
higher proportion of black children, a lower propor-
tion of Spanish-speaking children, and a higher
proportion of whites); in the analytic sample, all
of the PDC children had prior preschool experience
while less than one-third of the comparison group
had prior preschool. There were no group differences
on performance variables.

• Michigan--the analytic samples differed on four
background variables; the PDC group had a smaller
proportion of black children, the children had more
siblings, mothers were more likely to be employed,
and families were more likely to have fathers
present; the group differences in SES and ethnicity
present more serious problems for this site than
for any of the others.

• Utah--differences were found on three background
variables and on one performance measure; the PDC
children were slightly older than the comparisons,
their mothers were less likely to be employed,
and families were more likely to have fathers
present.

• West Virginia--differences were found on two back-
ground variables and two performance measures; the
PDC group had a higher proportion of boys than
the comparison group and PDC families were less
likely to have fathers present.

As explained in Chapter III, these imbalances appear to
be largely corrected when sites are aggregated, but for
within-site analyses to be feasible, a closer match on back-
ground and performance variables should be obtained for
Cohort 2. Recommendation 4 addresses procedures for achieving
improved PDC-comparison matches.

Table 10

Summary of Recommendations for Continuing the Evaluation of Child Impact

Notes: Only [illegible] children could be aggregated because
PDC and comparison children are not distinguished at the Head
Start level. [Illegible] analysis would not include the
comparison children.

At two sites (California-Spanish and Texas-English)
estimates of likely attrition and the projected numbers of
children to be served by the programs raise some concerns
for the possibility of a five-year longitudinal study when
the sample at each site is divided into Spanish- and English-
dominant children. California has a small proportion of
Spanish-dominant children, whereas in Texas, English-dominant
children are in the minority. Because of the importance
of the bilingual/bicultural demonstration projects, it is
recommended that testing be continued in these groups, even
though it may be necessary to aggregate data from the two
sites to achieve sufficient sample sizes for a child impact
study (see also Recommendation 6).

Problems with both group differences and projected
sample size were found for three of the sites (Connecticut,
New Jersey, and Texas-Spanish):

• Connecticut--PDC-comparison differences were found
on three background characteristics in the analytic
sample; the PDC group had a higher proportion of
Spanish-speaking children, a larger proportion of
children who had prior preschool experience, and
PDC families were more likely to have fathers
present; there were no group differences on any
of the performance measures. Although attrition
suggests the need for 75 children in each group
next fall, funding levels will permit the program
to serve only 60 children in each group. Attrition
is compounded by the problem of tracking comparison
children to 12 different elementary schools.

• New Jersey--group differences were found on four
background characteristics and one performance
measure; the PDC group had a lower proportion of
boys and a larger proportion of children who had
prior preschool; PDC families were less likely to
have a father present and PDC children had more
siblings. Problems with sample size are greatest
for the comparison Head Start group.

• Texas-Spanish--group differences were found on one
background characteristic and two performance measures
and the sample size required is considerably larger
than the number of children it is possible for the
program to serve.

Finally, there are two sites where a combination of
factors leads to the conclusion that continued child testing
would not be the best use of evaluation resources. In
Arizona, expected sample sizes are extremely small, but even
more important, adequate measures for assessing social
competence of Navajo-speaking children simply are not
available. In Florida, although there were relatively minor
differences between groups on background characteristics and
on one performance measure, the key consideration in this
recommendation is the sample size. The program is designed
to serve a small number of children and projected attrition
over a five-year period appears to be high. (It should be
pointed out that, although school records indicate high
attrition between kindergarten and third grade, recent
information from the program indicates that school-wide
attrition may not accurately reflect attrition within the
Head Start migrant population.) At each of these sites,
it would be best to focus the evaluation effort on those
activities (e.g., interviewing parents, surveying teachers,
assessing implementation) that can be most responsive to
the unique characteristics of the site.



3. Where sample size has been identified as a problem,
sites should be encouraged to increase the number of
children for fall 1976 in line with the numbers
recommended.



Sample size is likely to be a problem in at least five
sites, but the evaluation could be improved at all sites if
larger numbers of children were enrolled. At the same time,
sites should be encouraged to provide more recent and more
reliable data on attrition from Head Start to third grade to
permit the most realistic assessment of necessary sample size.



4. A high priority should be set on encouraging sites to
recruit children in a way that will maximize the match
between the PDC and comparison Head Start children.



The data presented by site in Interim Report III should
be used to guide recruiting decisions, and, whenever there are
more children available than can be served by the program,
children should be enrolled in such a way as to balance PDC
and comparison Head Start programs on background characteristics.
It is also recommended that High/Scope staff work with OCD
staff to develop procedures to assist the sites in following
this recommendation. Sites should also be encouraged to
assist in maintaining complete records on the children
enrolled so that information on background characteristics
will be as complete and accurate as possible.



5. The comparability of PDC and comparison groups at
each site must be reevaluated when fall 1976 data
are available.



The importance of the match on certain background charac- 
teristics cannot be overemphasized. Factors such as SES, 
prior preschool experience and ethnicity are known to relate 
to performance measures, and it cannot be assumed, a priori, 
that statistical adjustments will restore balance completely. 
Problems of unmatched samples are especially critical where 
sample sizes are small, as is likely to be the case with 
the expected attrition in most of the sites. 

If a site shows a small degree of imbalance on a small 
number of variables, a site-level analysis of children's 
test scores might still be possible so long as PDC and 
comparison samples are of adequate size. Larger imbalances 
on a larger number of variables for a given site may prohibit 
site-level analysis, but with adequate sample size, the site's
data may be usefully pooled with data from other sites for
an aggregate analysis. However, if a site has extreme problems
with both comparability and sample size, it is unlikely that 
child test data from that site can contribute usefully to 
any analysis. After examining fall 1976 data, the feasibility 
of continuing the longitudinal study can be assessed. 
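
The decision rule described above can be summarized in a small
sketch. The report gives no numeric thresholds, so the two
inputs here are qualitative judgments supplied by the analyst;
the class and function names are hypothetical, and collapsing
imbalance into two levels is a simplification. The combination
of small imbalance with inadequate sample size is not addressed
in this paragraph and is returned as undetermined.

```python
from enum import Enum


class Imbalance(Enum):
    SMALL = "small imbalance on a small number of variables"
    LARGE = "larger imbalance on a larger number of variables"


class AnalysisUse(Enum):
    SITE_LEVEL = "site-level analysis may be possible"
    AGGREGATE_ONLY = "pool with other sites for aggregate analysis"
    UNLIKELY_USEFUL = "unlikely to contribute to any analysis"
    UNDETERMINED = "case not covered by the stated rule"


def recommended_use(imbalance, sample_adequate):
    """Map a site's comparability and sample size to its likely use."""
    if sample_adequate:
        if imbalance is Imbalance.SMALL:
            return AnalysisUse.SITE_LEVEL
        return AnalysisUse.AGGREGATE_ONLY
    if imbalance is Imbalance.LARGE:
        # Problems with both comparability and sample size.
        return AnalysisUse.UNLIKELY_USEFUL
    return AnalysisUse.UNDETERMINED
```

For instance, `recommended_use(Imbalance.LARGE, True)` returns
`AnalysisUse.AGGREGATE_ONLY`, matching the case of a site whose
data are pooled rather than analyzed alone.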



6. Special attention should be given to providing additional
support to the two bilingual/bicultural demonstration
sites that serve both Spanish- and English-speaking children.



PDC is unique among national demonstration programs in its
attempt to seriously address the needs of bilingual/bicultural
children, and considerable effort has been devoted to developing
both implementation and impact assessment procedures that are
sensitive to the special characteristics of the programs and
their children. At one BL/BC program (Colorado) there appears
to be no difficulty with projecting within-site analyses of
child impact, but an evaluation of this site alone would not
provide an adequate test of BL/BC programs, since all children
are English dominant. The other two sites present some
difficulties, partly because of insufficient numbers of
children in each language group. At current and projected
sample sizes, within-site analyses would only be possible
for the English-speaking children in California; data from
California and Texas would have to be pooled in order to
assess impact on Spanish-speaking children. Although this
may present conceptual problems because of differences among
sites in the populations served and in the nature of the
bilingual programs, the Office of Child Development should
determine whether pooling data from these two sites would
provide the necessary information on program effectiveness.
If not, additional resources should be found to enable
California and Texas to serve a larger number of children,
if larger populations of eligible children exist who are
not currently enrolled.



Conclusions 



Two general conclusions emerge from the findings of this
report. First, the child measures pilot-tested in fall 1975
are basically suitable for assessing the impact of PDC, at
least at the Head Start level. By dropping a few measures,
modifying others and pilot testing two additional measures
this spring, it is believed that a sound measurement battery
can be achieved.

The second conclusion is that the methodological problems
assessed in this report, while real, are still manageable.
If the recommendations can be followed (particularly with
respect to reducing PDC-comparison group differences) there
is every indication that a successful evaluation of the
impact of PDC on children's social competence can be completed.

APPENDIX A

Descriptions of the Measures Selected

The test review process resulted in the selection of the
following instruments:

Social-Emotional Measures

PDC Classroom Observation System
Preschool Interpersonal Problem-Solving Test (PIPS)
Pupil Observation Checklist (POCL)
Stephens-Delys Reinforcement Contingency Interview (S-D)

Psychomotor Measures

Arm Coordination [McCarthy Scales of Children's Abilities
(MSCA)]
Leg Coordination (MSCA)
Block Building (MSCA)
Draw-A-Child (MSCA)

Cognitive and Language Measures

Block Design (WPPSI)
Block Design (WISC)
Conceptual Grouping (MSCA)
Opposite Analogies (MSCA)
Verbal Memory (MSCA)
Verbal Fluency (MSCA)
Do You Know...? (CIRCUS)
Say and Tell (CIRCUS)
Bilingual Syntax Measure (BSM)

Other Measures

PDC Child Rating Scale
Height and Weight
Adult Language Check
Demographic Information Sheet
Wepman Auditory Discrimination Test

Each of these measures is described briefly below. For 
a more extensive review, seellnterlm Report II, Part Bi 
p^^^mm^nrlRtlQns for Measuring Program Impact (1975). 

PDC Classroom Observation System (High/Scope Foundation, 
unpublished). The PDC observation system was developed to 
provide information about children's classroom behavior along 
dimensions pertinent to the social-emotional goals of Project 
Developmental Continuity. The system focuses on aspects of an 
individual child's behavior, verbal or nonverbal, that reflect 
the child's attitude toward himself, and on the child's social 
competence as demonstrated in his interaction with peers and 
adults. 

Using a time sampling method, trained observers observe 
each child for five minutes at four different times during the 
day and code the child's behavior into eight general categories. 
These categories include "noninvolved," "involved," "interacts 
with peer," "uses peer as resource," "interacts with adult," 
"uses adult as resource," "expresses pride in personal achievements 
or attributes," and "dramatic play." A ninth category, "activity 
level," is included to provide information concerning the 
context in which these behaviors were observed. Each of these 
categories includes subcategories that are designed to identify 
the frequency and nature of specific behaviors within the general 
category. 
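
As a rough illustration of how time-sampled codes of this kind 
might be tallied, the Python sketch below counts how often each 
general category was coded for one child across the four 
five-minute observation periods. The category labels follow the 
text; the record layout, the function name tally_categories, and 
the sample codes are hypothetical and are not taken from the PDC 
coding materials. 

```python
from collections import Counter

# The eight general behavior categories named in the text.
CATEGORIES = [
    "noninvolved",
    "involved",
    "interacts with peer",
    "uses peer as resource",
    "interacts with adult",
    "uses adult as resource",
    "expresses pride",
    "dramatic play",
]


def tally_categories(periods):
    """Count coded behaviors per category across observation periods.

    `periods` is a list of lists; each inner list holds the category
    codes recorded during one five-minute observation (assumed layout).
    """
    counts = Counter({name: 0 for name in CATEGORIES})
    for codes in periods:
        for code in codes:
            if code not in counts:
                raise ValueError(f"unknown category: {code!r}")
            counts[code] += 1
    return dict(counts)


# Invented example: four observation periods for one child.
example_periods = [
    ["involved", "involved", "interacts with peer"],
    ["noninvolved", "involved"],
    ["uses adult as resource", "involved", "dramatic play"],
    ["interacts with peer", "expresses pride"],
]
example_counts = tally_categories(example_periods)
print(example_counts["involved"])
```

Categories that were never coded are kept in the result with a 
count of zero, so that tallies for different children share the 
same set of keys. 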

Preschool Interpersonal Problem-Solving Test (Shure and 
Spivack, 1974). The PIPS attempts to assess the child's ability 
to name alternative solutions to a life-related problem: that of 
obtaining a toy from another child. Paper cut-outs of boys, 
girls and toys are used in presenting the problem. Among 
inner-city four-year-olds attending the Philadelphia Get Set day 
care program, those judged as better adjusted by their teachers 
were able to conceptualize a greater number and a wider range of 
alternative solutions to real-life problems than were their more 
poorly adjusted classmates. 

Pupil Observation Checklist (High/Scope Foundation, unpublished). 
This is a rating scale consisting of eleven 7-point bipolar 
adjectives derived from a similar scale used in the Home Start 
evaluation. The items are divided into two subscales, Child's 
Testing Behavior and Problem-Solving Behavior, and the ratings 
are all completed by the tester after he or she has administered 
all the other measures in the battery to a child. 

Stephens-Delys Reinforcement Contingency Interview (Stephens 
and Delys, 1973). This instrument seeks to measure the extent 
to which a child believes that the behavior of others around 
him is contingent on his own behavior. The instrument was 
shortened to a version consisting of 12 items that ask questions 
such as "What makes teacher smile?" Responses are coded 
internal if answered in a way that indicates attribution of 
control to oneself (e.g., "When I...") and external if the answer 
suggests that the behavior is under the control of someone else 
(e.g., "When Daddy..."). 
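
The internal/external coding can be illustrated with a short 
Python sketch. It assumes that a trained coder has already 
assigned one code to each of the 12 responses and that a child's 
score is simply the number of internal responses; the report does 
not state how the score is computed, so that rule, the function 
name internal_count, and the sample codes are all hypothetical. 

```python
def internal_count(codes):
    """Return the number of responses coded 'internal'.

    `codes` holds one hand-assigned code per interview item, either
    'internal' or 'external' (assumed data layout).
    """
    allowed = {"internal", "external"}
    for code in codes:
        if code not in allowed:
            raise ValueError(f"unexpected code: {code!r}")
    return sum(1 for code in codes if code == "internal")


# Invented codes for the 12 items of the shortened interview.
example_codes = [
    "internal", "external", "internal", "internal",
    "external", "external", "internal", "external",
    "internal", "internal", "external", "internal",
]
example_score = internal_count(example_codes)
print(example_score, "of", len(example_codes))
```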



McCarthy Scales of Children's Abilities (McCarthy, 1972). 
These subtests consist of a series of tasks tapping 
problem-solving, psychomotor, and conceptual abilities, and are similar 
to the Wechsler scales, but with emphasis on age-related 
maturational indicators. The particular McCarthy subtests used 
in the Impact Study are listed by category at the beginning of 
this section. 

Wechsler Preschool and Primary Scale of Intelligence, 
Block Design (subtest) (Wechsler, 1967). The task requires 
reproducing (constructing) designs with flat colored blocks, 
either from the examiner's model or from a picture on a card. 
The measure taps problem-solving abilities, flexibility of 
response style, visual-motor organization, and execution. 

Wechsler Intelligence Scale for Children, Block Design 
(subtest) (Wechsler, 1949). The task consists of reproducing 
(constructing) designs with colored blocks (cubes), either 
when modeled by the examiner or when presented on a card. The 
measure taps problem-solving abilities, flexibility in response 
style, visual-motor organization, and execution. 

CIRCUS subtest: Do You Know...? (Educational Testing 
Service, 1974). This is a general information test. The child 
chooses the picture which appropriately answers the examiner's 
question. This task taps the child's experience in a variety of 
areas (health, safety, social standards, consumer concepts). 

CIRCUS subtest: Say and Tell (Educational Testing Service, 
1974). This test consists of two parts and taps children's 
language abilities. In the first part the child is given a 
pencil and asked attribute questions, e.g., "What color is it?"; 
in the second part the child is given two pennies and is asked 
to describe them. Scoring is based on categories of attributes 
which the child mentions. 

Bilingual Syntax Measure (Burt, Dulay and Hernandez-Chavez, 
1975). This test is designed to measure children's oral proficiency 
in standard English and/or standard Spanish grammatical structures. 
Simple questions are used with cartoon-type colored pictures 
to provide a conversational setting for eliciting natural speech. 
An analysis of the child's response yields a numerical indicator 
and a qualitative description of the child's structural language 
proficiency in standard English or standard Spanish. Responses 
are written down verbatim. 

PDC Child Rating Scale (High/Scope Foundation, unpublished). 
This instrument is designed as a measure of social competence 
to be administered by the respective classroom teachers of the 
children rated. For each of the 39 items, specific behaviors 
such as "Uses words or wits to influence others" are rated on 
a 5-point scale according to frequency of occurrence ("Very 
frequently" to "Rarely"). Summation of the ratings yields 
two aggregate measures: interpersonal competence and task 
competence (learning-to-learn ability). 
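
The summation step can be sketched in Python as follows. The 
report does not say which of the 39 items feed each aggregate or 
how the frequency labels map to numbers, so the item groupings, 
the 1-to-5 numeric coding, the function name aggregate_scores, 
and the six sample ratings below are all hypothetical. 

```python
def aggregate_scores(ratings, interpersonal_items, task_items):
    """Sum item ratings into the two aggregate measures.

    `ratings` maps an item number to a rating from 1 to 5 (assumed
    numeric coding of the 5-point frequency scale).
    """
    for item, value in ratings.items():
        if not 1 <= value <= 5:
            raise ValueError(f"item {item}: rating {value} is out of range")
    return {
        "interpersonal_competence": sum(ratings[i] for i in interpersonal_items),
        "task_competence": sum(ratings[i] for i in task_items),
    }


# Invented ratings for six items only, to keep the example short.
example_ratings = {1: 5, 2: 3, 3: 4, 4: 2, 5: 1, 6: 4}
example_totals = aggregate_scores(
    example_ratings,
    interpersonal_items=[1, 2, 3],  # hypothetical item grouping
    task_items=[4, 5, 6],           # hypothetical item grouping
)
print(example_totals)
```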

Height and Weight. These two items of data are collected 
for all children in the sample. In most cases the tester 
personally weighs and measures the child, although in some 
instances Head Start records were used to avoid duplication 
of effort. 

Adult Language Check. This measure is used in the 
bilingual/bicultural demonstration sites to obtain an indication 
of the languages the adults in the classroom use during their 
interactions with children. The interviewer sits in the 
classroom for a two-hour period and records the language used 
by the teachers and aides approximately every five minutes. 

Demographic Information Sheet. Demographic information 
such as years of mother's education, presence of handicap, 
previous preschool experience, number of siblings, occupation 
of parents, etc. was collected primarily from the Head Start 
records. 

Wepman Auditory Discrimination Test. This instrument 
tests children's ability to discriminate between sounds. 

Not all the above measures were used in all PDC sites. 
After the basic battery was selected, each site was asked 
whether any of the five optional measures (Wepman, Do You Know...? 
(CIRCUS), How Much and How Many (CIRCUS), How Words Work (CIRCUS), 
and Opposite Analogies (MSCA)) were related to their specific 
goals and objectives and, if so, whether they wanted them included 
in the battery for that particular site. Four sites did request 
at least one site-specific measure. Information on which 
measures were administered in each site is given in Tables A-1 
and A-2. 

Instruments Administered to PDC and Comparison Students at Each Site 

Site-Specific Instruments, Chosen Locally 

APPENDIX B 

Flow Charts for the Analysis Procedure 

Figure B-1 

Overall Flow Chart for Basic Data Analysis 

Figure B-2 

Flow Chart for Step 1: Measure Reliable? 

Figure B-3 

Flow Chart for Step 2: Are the Measures Valid? 



[Flow chart not reproduced. Recoverable steps: 2A, determine 
desirable r value ranges between all measures; are any sites left 
to analyze?; 2D, have two BL/BC and six non-BL/BC sites been 
analyzed?; 2F, are any measures left to analyze?; do r values fall 
within the desired range for that measure? If yes, the measure is 
valid for this site and a total score is computed. If not, is there 
a reason for lack of validity?; 2K, correct the reason for the next 
testing but omit the measure from baseline; 2L, omit the measure 
from further consideration. Legend: T = table for this operation; 
action and decision boxes; "go to" connectors; YES/NO flows; 
mandatory flows.] 




Figure B-4 

Flow Chart for Step 3: Are the Groups Comparable? 